Efficient Methods for Dealing with Missing Data in Supervised Learning

Author

  • Volker Tresp
Abstract

We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks. For training, we show how the backpropagation step for an incomplete pattern can be approximated by a weighted averaged backpropagation step. The complexity of the solutions for training and recall is independent of the number of missing features. We verify our theoretical results using one classification and one regression problem.
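The recall idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes isotropic Gaussian Parzen windows centered on complete training inputs, a shared width `sigma`, and missing entries marked as `None`. The names `gaussian_kernel` and `predict_with_missing` are hypothetical and chosen only for illustration. Each training point fills in the missing coordinates, and the network outputs are averaged with weights given by how close the training point is on the observed coordinates.

```python
import math


def gaussian_kernel(x, center, sigma):
    """Unnormalized isotropic Gaussian window over the supplied coordinates."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def predict_with_missing(model, x, train_inputs, sigma):
    """Approximate the expected model output given only the observed part of x.

    Assumptions (not taken from the source): missing entries of x are None,
    train_inputs holds complete feature vectors used as window centers,
    and all windows share the same width sigma.
    """
    observed = [i for i, v in enumerate(x) if v is not None]
    if len(observed) == len(x):
        return model(list(x))  # nothing is missing: ordinary recall

    x_obs = [x[i] for i in observed]
    numerator = 0.0
    denominator = 0.0
    for center in train_inputs:
        # Weight: closeness of this training point on the observed coordinates.
        weight = gaussian_kernel(x_obs, [center[i] for i in observed], sigma)
        # Fill the missing coordinates with the training point's values.
        completed = [center[i] if v is None else v for i, v in enumerate(x)]
        numerator += weight * model(completed)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("all window weights underflowed; try a larger sigma")
    return numerator / denominator


# Toy stand-in for a trained feedforward network (hypothetical).
toy_model = lambda v: v[0] + 2.0 * v[1]
toy_train = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
estimate = predict_with_missing(toy_model, [1.0, None], toy_train, sigma=0.5)
```

In this sketch the cost is one forward pass per training point, whether one feature or several are missing, which is in the spirit of the complexity claim in the abstract.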

Similar resources

Efficient Methods for Dealing with Missing Data in Supervised Learning

Subutai Ahmad, Interval Research Corporation, 1801-C Page Mill Rd., Palo Alto, CA 94304. We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks...

Missing or Inapplicable: Treatment of Incomplete Continuous-valued Features in Supervised Learning

Real-world data are often riddled with data quality problems such as noise, outliers and missing values, which present significant challenges for supervised learning algorithms to effectively classify them. This paper explores the ill-effects of inapplicable features on the performance of supervised learning algorithms. In particular, we highlight the difference between missing and inapplicable...

Imputation of Missing Data Using Machine Learning Techniques

A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine le...

Missing Data Imputation for Supervised Learning

This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on ...

Techniques for Dealing with Missing Data in Knowledge Discovery Tasks

Information plays a very important role in our life. Advances in many research fields depend on the ability of discovering knowledge in very large data bases. A lot of businesses base their success on the availability of marketing information. This kind of data is usually big, and not always easy to manage. Scientists from different research areas have developed methods to analyze huge amounts ...

Publication date: 1995